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Abstract 

This works explores and illustrates recent results developed by the author in field of dynamical network 
analysis [121 1211 1131 1201 1221 1111 1191 110] . The considered approach is blind, i.e., no a priori assumptions 
on the interconnected systems are available. Moreover, the perspective is that of a simple "observer" who 
can perform no kind of test on the network in order to study the related response, that is no action or 
forcing input aimed to reveal particular responses of the system can be performed. In such a scenario 
a frequency based method of investigation is developed to obtain useful insights on the network. The 
XJ*± . information thus derived can be fruitfully exploited to build acyclic graphical models, which can be seen 

as extension of Bayesian Networks or Markov Chains. Moreover, it is shown that the topology of polytree 
linear networks can be exactly identified via the same mathematical tools. In this respect, it is worth 
observing that important real systems, such as all the transportation networks, fit this class. 

^ ! 1 Introduction 
^ . 1.1 Problem background 



In the last decade the analysis and the study of networks has become ubiquitous. In particular, in the 
study of complex systems the comprehension of the internal connections among the elementary units, which 
define the underlying structure of the process, turns out to play a key role to fully understand its principal 
mechanisms. While networks of dynamical systems have been deeply studied and analysed in systems theory 
(see |27[ [TBI [T7] and references within for a sample of recent studies) , a great need still exists for methods 
to infer their structure and their topological characteristics [53] • Indeed, in most engineering scenarios the 
network is given or it is the very objective of design. On the other hand, dynamical networks represent a 
powerful tool to model large scale systems, dividing the problem into a set of reduced complexity subsystems 
to be designed. In this scenario inferring a suitable internal topology is of fundamental importance. In 
other situations the structure of the network can not be a priori known, but its agents can be designed 
to identify what their neighbours are, to interact with them following a distributed strategy and, more 
generally, to establish any sort of consensus [22]. In this case, typical examples are given by sensor networks 
or cooperative controllers [25] • On the other hand, there are also interesting situations where the agents are 
not directly manipulable, the link structure is actually unknown and it is important to reconstruct it along 
with the underlying dynamics. 

In this perspective, Graphical Models are a marriage between probability theory and graph theory. They 
provide a natural tool for dealing with two problems that occur throughout applied mathematics and engi- 
neering - uncertainty and complexity - and in particular they are playing an increasingly important role in 
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the design and analysis of Machine Learning algorithms and Neural Networks. Fundamental to the idea of 
a graphical model is the notion of modularity, i.e., a complex systems turn out by combining simpler parts. 
Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is 
consistent, and providing ways to interface models to data. Many of the classical multivariate probabilistic 
systems studied in fields such as Statistics, Systems Engineering, Information theory, Pattern Recognition 
and Statistical Mechanics are special cases of the general graphical model formalism; examples include mix- 
ture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model 
framework provides a way to view all of these systems as instances of a common underlying formalism. 

Relevant examples about the importance of the connection structure in defining the properties of a system 
can be observed in different fields, such as Economics [TB], Biology [HUE], Ecology [33] an d Neural Sciences [2]. 
We commonly find this kind of problems in Biological Neural Networks [2] , Biochemical Metabolic Pathways 
[8], Ecology [33] and also Financial Markets (especially when trade is happening at a high frequency) [241132) . 
In all these cases very little is known about the network topology and the dynamics among the elementary 
agents. Hence, even inferring some relation properties between two different nodes constitutes a precious 
information [5]. 

To the best knowledge of the author, there are only few theoretical or methodological results about the 
reconstruction of an unknown topology related to a set of networked dynamical systems when the agents are 
not manipulable. Recently, this challenging problem has started drawing the attention of many researchers, 
especially in the physics community. Interesting and novel results are given in [3T] , [53J and [5] , even though 
they are not systematic and do not provide theoretical guarantees about the correctness of the identified links 
and the related dynamics. In particular, in [31| . |23) and [5] both theoretical and numerical procedures to 
detect the active links inside a dynamical networks are developed for specific classes of models. In [23] the 
attention is focused on sparse networks affected by noise and the aim is obtaining a final topology with a 
reduced number of links. No guarantee of an exact identification is provided. In [31] a more general class of 
networks is addressed, but the proposed method to identify their topology is based on the assumption that 
the input of every single node can be manipulated and that it is possible to perform many experiments to 
detect the adjacent links. Unfortunately, such an hypothesis is not feasible in most real-world situations and 
it is not practical for large networks. In[S] a generalization of Granger causality is introduced to quantify the 
amount of shared information among times series. 

Even though the above mentioned works can be settled in the problem of reconstructing the internal con- 
nections of a dynamical network, these approaches are not yet systematic and they do not provide theoretical 
guarantees about the correctness of the identified links and, possibly, of the corresponding dynamics. 

In this paper, the results developed in the recent years by the author in the above field are presented and 
summarized. A short introduction of the reference scenario and the main assumption and ideas are reported 
hereafter. 

1.2 Blind frequency approach 

In the following a method to infer a simplified connected graph, based on standard tools from identification 
and filtering theory [141 [I], is presented. The proposed approach is blind, in the sense that no a priori 
knowledge about the original network topology is given. Moreover, every observed process is associated with 
a distinct agent and only their outputs can be measured. Furthermore, no action can be performed to alter 
the network behaviour in order to test its response to external perturbations. Then, we assume to measure 
the scalar outputs {Xj}j = i : „^N of a network system composed of many interacting agents without knowing 
what the internal interconnections are and without any possibility to alter the functioning via a sort of testing 
input. 

The main idea is to estimate each signal by using the others as the inputs of a dynamical system to be 
identified. Indeed, the corresponding models provide a description of the dependencies among the processes 
and of the internal dynamics of the whole system, as well. Moreover, for each agent, every process acting 
as input in its dynamical model can be associated to a link in the reconstructed network topology. Then, 
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the optimal solution in terms of the minimization of the modeling error would be given by an a priori large 
number of dynamical models, namely one for every possible link. However, this approach is not useful because 
of both the a priori large number of model to be identified and the lack of proper distinctions among the 
links. 

The proposed method, instead, aims to reconstruct a sparse link structure, that is, to select only a limited 
number of chosen inputs. Such a purpose is motivated by the need to provide a final model able to provide 
immediate information about the network structure by detecting the "most important" links. This approach 
is suggested by the observation that in a large network one could expect that most links could be neglected 
without affecting in a significant way the quality of the identification. Consequently, suboptimal strategies 
not only need to be taken into account, but can also be more informative to detect the internal mechanisms of 
interdependence among the signals. Indeed, one may obtain a better understanding of the network topology 
by forcing a proper selection of a reduced number of links. 

To this aim, we show how a simple suboptimal strategy can be employed to provide a graphical model 
of the network with a reduced complexity, where the links associated to each agent could also be used to 
single out a selection of inputs for its dynamical model. The main goal is to obtain a simplified model of 
the network, which can be used to reveal some insights about its internal structure. Therefore, given a set 
of N signals {X,}.,^...^, the problem of identifying, for each of them, the rrij signals providing the "best" 
estimate is defined. This optimization problem is combinatorially intractable and it can be approached only 
for small values of rrij . In this paper, however, it is shown that solving the problem in the simplest case, where 
rrij = I for any j, provides information about the connection structure of the processes. Such a result also 
suggests to exploit the corresponding topology to derive for each j a suboptimal set of rrij processes, possibly 
with rrij > I, in order to estimate via multivariate linear filtering the dynamics of the related Xj. We also 
examine some general properties of the network structure, which can be obtained in this special case, and 
we investigate how it can be employed as a useful tool to unveil the network structure gaining insights into 
its topology. Such an information not only provides a deeper understanding of the internal dynamics, but it 
also represents a useful tool in the realization of a suitable model of the whole system. A class of dynamical 
networks that can be completely and exactly disclosed via the proposed method is also present. Such a class 
turns out quite common and, for instance, it contains distribution networks. The application of the above 
approach to clustering problems will also be presented. 

1.3 Notation 

Let R and C respectively be the real and complex numbers sets. Then, for any z 6 C such that z = a + jf3, 
with a, /? G R, the complex conjugate of z is defined as z* = a — jj3. 

Let X(t) and Y(t) be time-discrete zero-mean wide-sense stationary processes. We denote by E[-\ the mean 
operator. It follows that E[X(t)] = E[Y(t)] = 0. Since the two processes are wide-sense stationary it 
is possible to define the cross-covariance and the covariance functions, Rxy{t) := E[X(t)Y(t + r)] and 
Rx(t) := Rxx(t), with no ambiguity since there is no dependence on t. Given a time-discrete sequence a(t), 
we also define 

Z[a(.)](z) := «(*)*"' 

t— — oo 

as the bilateral Zeta-transform of the signal a(t) for any z 6 C such that the above sum exists. Moreover, 
we denote by &xy(z) := Z[Rxy(')](z) the cross-power spectral density between X(t) and Y(t) and by 
<&x{z) ■= ^xx(z) the power spectral density of X(t) assuming that the bilateral Zeta-transform converges 
in a neighborhood of the unit circle \z\ = 1. Finally, let (f>x{u) = ^xie 1 ^) and also let 

{Z[a(.)](z)} c :=5>(t)z-* 
t=o 
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be the causal truncation of a filter or a signal. 



2 Problem set up 

Proofs and mathematical details are omitted for the sake of simplicity. See |12l I2T1 [TU1 [XT] for a further 
reading. 

2.1 Mathematical tools 

In this section some basic notions of graph theory, which are functional to the following developments are 
recalled as well as the main mathematical tools. For an extensive overview on graph theory see [7j- First we 
provide the standard definition of a graph. 

Definition 1. An indirected (or undirected) graph G is a pair (V,A) comprising a finite set V of vertexes 
(or nodes) together with a set A of edges (or arcs), which are unordered subsets of two distinct elements of 
V . A directed graph differs from the indirect type, because the edges are ordered couple of nodes, the first one 
being denoted as the departing node of the arc and the second the arriving vertex. The order of the couple 
define the direction of the edge. 

In a graph it can also be defined the notion of "path" linking two nodes as an ordered sequence of contiguous 
edges having them at the extremities. A path formed by coherently directed edges is said a directed path. 
The following definition introduces the notion of "tree". 

Definition 2. A graph T — (V,A) is a tree if for any pair of distinct nodes iVj, Nj G V there is a unique 
path linking them. 

The more standard definition of a tree as an acyclic and connected graph is equivalent to Definition [5] as 
shown in [7]. However, in the development of our results we will extensively employ the property of uniqueness 
of a path linking two nodes. It is also important to recall the notion of connected graph, that consists in the 
existence of a path between every couple of its nodes. 

Lemma 3. Given a connected graph G = (V, Aq), there exists a tree T — (V, At) such that At C Aq. The 
tree T is also said to be a spanning tree for G. 

The use of a weighting function on the edges of a graph allows one to define a weight for any of its 
"subgraphs" as the sum of the values associated to each of the corresponding arcs. 

Definition 4. Given a connected graph G — (V, A) and a weighting function w : A — > 5ft, we define Minimum 
Spanning Tree (MST) any spanning tree T of G such that 

w(T)<w(T') (I) 

for any other spanning tree T' of G. 

In this work directed graphs are also addressed and so the concepts of "rooted tree" and "polytree" are 
also introduced hereafter. 

Definition 5. A rooted tree is a directed graph T = (V,A) such that its edges defines a tree and they satisfy 
the condition that only a unique node N r £ V , referred to as the root, has exclusively departing edges. A 
polytree extends the rooted tree concept in the sense that the property of only departing edges is satisfied by 
a finite subset 1Z of the node set. 
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The presence of root nodes denotes a partial ordering among them in terms of couples of nodes linked 
by directed paths. For instance, if two vertexes are connected by a directed arc, the departing one is said 
a parent of the other, that in turn is denoted as a child of the first node. In the following the set of all 
the parents of the j-th node will be addressed as IT(j). The notions of ancestor and descendant extend and 
generalize the above situation. 

The previous definitions and lemmas are functional to introduce a rigorous model to address noisy linear 
dynamical systems interconnected to form a tree-like topology. 

Definition 6. Consider the triple T = (T, X, H) where 

• T = (V, A) is a polytree counting n nodes {Nj}j = \ n and k roots among them; 

• X = {Xi} i=1 is a set of n zero-mean wide-sense stationary and ergodic discrete stochastic processes; 

• % = {Hji(z)}j i=1 n is a set of n time-discrete SISO transfer functions Hji(z) represented in the 
domain of the Z -transform and functionally linking the output of the i-th node with the input of the 
j-th one; it is also assumed that Hji(z) — if and only if there is no connections between the i-th and 
the j-th system. 

For j = 1, ...,n, define the following stochastic processes 

f er .= x j if x , n 

\ Qj := Xj - J2ien(j) Hji( z )Xi if Nj is child of Ni for each i e Tl(j). ^ ' 

We say that T is a Acyclic Linear Network (ALN) if the following condition is satisfied 

E[QjQi] = when i ^ j. (3) 

It is worth underlining that a number of real world systems can be represented as ALNs despite the peculiar 
link structure of such a model. For instance, a special class of ALNs is represented by the transportation 
systems. A special subclass, in particular, consists in the power distribution network []. 



2.2 ALN perspective 

Let us assume that the observed system can be seen as a set of interconnected processes. Then, let us represent 
such a system as a set of N agents interacting together according to an unknown structure. Moreover, assume 
that it is possible to obtain scalar measurements from these agents in the form of N scalar time series. 
Furthermore, let it be possible to remove any deterministic component from these time series, in order to 
obtain N stochastic processes {Xi}t = i,.. jv which are wide sense stationary and with zero mean [30J. We 
intend to derive graphical model in term of a ALN representing the strongest interconnections among the 
processes in order to obtain a simplified representation of their unknown connectivity structure. This can 
be used as starting point for developing more complex topological descriptions, but the links of each agent, 
representing its strongest dependences, can also be exploited to design a suitable dynamical model for it. 

The approach we follow to single out the network connectivity is given in terms of filtering theory |14| . 
Given two stochastic processes Xj and Xj, we say that Xj does not contain more information that X{ if 
it is possible to fully determine Xj from the knowledge of Xj by the application of a suitable operator. 
This observation provides us with a tool to quantify the "relatedness" of two stochastic processes in the 
following way: they are "related", if it is possible to provide a "good" estimate of one (according to a specific 
criterion) by knowing the other, or viceversa. A special situation occurs when the operator mapping Xj 
into Xj is invertible. In this paper, we develop an approach intended to be used in situations where the 
operator, providing the best estimate of one process given the other, needs to be derived from the data itself 
(for example by using an identification or a match filter technique). In these cases, it is typical to adopt a 



5 



parametric approach by choosing a fixed class of possible models. In this paper we limit ourselves to the class 
of linear time-invariant operators mapping stationary processes into stationary ones and we consider a least 
square criterion for the optimization. Indeed, in such a case the solution of the best estimate is well-known 
and it is given by the Wiener filter. We examine two scenarios: non-causal and causal Wiener filtering. In 
the non-causal scenario the considered transformations are always invertible. This property will lead to the 
definition of a (pseudo)-metric on the set of stationary processes. In the causal scenario, instead, this property 
is lost, but the general approach to determine an acyclic connectivity structure can still be applied. 

According to the above formulation, we would like to develop a suitable graphical model describing the 
connectivity structure of the network. In particular, we intend to link each node to the most "related" 
ones, where the "relatedness" criterion comes from SISO Wiener filtering along with the acyclic topology 
requirement. We highlight that a similar and more general result can be obtained by modeling every single 
process Xj as the output of a MISO (Multiple Input Single Output) linear system generalizing (flUl) . 

X 3 (z) = e J (z) + Y / W Jl (z)X t (z), (4) 
ieij 

whose set of inputs are the processes {X T } reI ^ providing, according to a chosen criterion, the best 
trade-off between their number and the largest reduction of the cost function (|5|). However, such an approach 
relies on several combinatorial searches in order to point out the most appropriate input set Therefore, 
we rather aim to develop a suboptimal solution, exploiting the weight matrices defined in the previous section, 
to derive graphs, whose topologies could be possibly used to design the linear MISO models of each process. 
To this purpose, let us represent the process Xj 6 8 as the j-th node of the set V. Moreover, let us weight 
the edges connecting the N elements of V according to the matrix Wy G R NxN . Moreover, consider a set 
A(W) C V x V of (un)directed arcs connecting nodes of V and denote by Q = Q(V, A(W)) the corresponding 
(un)directed graph. Hence, if the nodes, which are closer in the graph to the j-th one, are chosen so that they 
represent the processes having the strongest relations with Xj, then the input set J(j) of a possible MIMO 
model of Xj can be constructed starting from its neighbours. 

2.3 Preliminary results 

First, we need to introduce a few preliminary results which will be exploited to define a mathematical tool 
for the quantitative characterization of the connections among the systems of the network. The main idea 
developed in this section is to define a distance among time series in order to establish a useful concept of 
"closeness". Specifically, we will assume a process Xi as the input of a linear system Wji(Z) whose output will 
be used to estimate Xj . The two processes will be considered "close" if it is possible to find a linear transfer 
function Wji which provides a "good" estimate of Xj. In this respect, our approach will rely on standard 
least squares techniques, namely Wiener filtering. The distance function will be defined as the variance of 
the suitably filtered error signal in order to guarantee the properties of a metric. In particular, we distinguish 
two cases, which we refer to as the "causal" and the "non-causal" scenario. We are treating these two cases 
separately, because they will lead to two different kinds of graphs. 

2.3.1 Non-causal scenario 

First, for the sake of completeness, we provide a formulation of the Wiener filter that will be useful in the 
development of our method. 

Proposition 7 (Non-causal Wiener filter). Given a frequency weighting transfer function Q(z) analytic on 
the unit circle \z\ = 1, a zero-mean wide-sense stationary scalar process Y and a zero-mean wide-sense 
stationary vector process Z, the Wiener filter modeling Y by Z is the linear stable filter W(z) minimizing the 
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variance of the error Sq := Q(z)[Y — W(z)Z], that is 

W(z) = axgwmE[e%]. (5) 

W(z) W 

Its expression is given by 

W(z) = <P YZ (z)<i> z 1 (z) (6) 
and it does not depend upon Q{z). Moreover, the minimized cost is equal to 

mini? [s 2 Q ] = ±- / |Q(e^)| 2 ($ y (w) - $^(^)$ z 1 ( W )$ yz ( W )) dw. 

Given two scalar stochastic processes JQ, X,-, let Wji(z) be the Wiener filter modeling X, from Xj. As a 
consequence of Proposition [7] we have that, for any weighting transfer function Qj(z) 



E 



[QiWXj-WjiWXi)]*] = |* \Qj(en\ 2 ~ '^ff (7) 



By choosing Qj(z) equal to the inverse of the spectral factor of $x (z), that is the stable and causally 
invertible filter Fj(z), such that |29) : 

^(z^^zXFT^))*, (8) 

we obtain 

. pr 2 1 r/, i^mi 2 \ 

This special choice of Qj(z) makes the cost depend explicitly on the coherence function of the two processes 
HU: 

which is non-negative and symmetric with respect to uj. It is also well-known that the Cross-Spectral Density 
satisfies the Schwartz Inequality. Hence, the coherence function is limited between and 1. The choice 
Qj{z) = Fj(z) can be now understood as motivated by the necessity to achieve a dimensionless cost function 
not depending on the power of the signals. 

Let us consider, now, a set of discrete zero mean and wide sense stationary stochastic processes. For the 
sake of simplicity, hereafter we will neglect the signal argument t or z, when it can't be misunderstood. The 
cost obtained by the minimization of the error ep. using the Wiener filter as before allows us to define on 
the function 



d\ .V,..V ; i : 



1 

2^ 



(1 - C XiXj (u,)) du 



1/2 



vxajee. (ii) 



Proposition 8. The function d(-, ■) as defined in ill]) is a pseudo-metric. 

Remark 9. Observe that pip assumes values between and 1. The first case happens only when C'xiXj (w) = 
1 for any uj; the second situation, instead, is related to the scenario where CxiXj (w) is always equal to 0, i.e., 
when the cross-spectral density between Xj and Xj never assumes values different from zero. 

For a given set counting N processes we also introduce the symmetric weight matrix Dq G M. Nx n such 
that 

D @ (j,i) ■= d(X s ,Xi), VXj.X 60. (12) 
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It is worth observing that D@(j,i) expresses how much good is the linear model of Xj (Xi) consisting of the 
only input process Xi (Xj). Therefore, the information recorded in the j-th raw (column) allows one to sort 
the elements of O according to their capacity of describing Xj in term of the linear SISO (Single Input Single 
Output) model 

Xj = ej + Wji [z)Xi , i= 1,...,N. (13) 
In particular, it holds that D@(j,j) = Vj = 1, . . . , N. 



2.3.2 Causal scenario 

Given two stochastic processes JQ, Xj and a transfer function Wji(z), let us consider again the quadratic 
cost £q := Q(z)[Xj — Wji(z)Xi\. Analogously to the previous case it is possible to derive a causal linear filter 
minimizing ([5]) . The causal filter providing such a minimization is referred to as the causal Wiener filter |14j . 

Proposition 10 (Causal Wiener filter). The Causal Wiener filter modeling Xj by Xi is the causal stable 
linear filter W^(z) minimizing the filtered guantity EQ j := Qj(z)[Xj — Wji(z)Xi]. It does not depend upon 
Qj(z) and its expression is given by 



H $M = ^M|F,(=)-^j c , ,14) 

where Fj(z) is the inverse of the spectral factor of <&xAz). 

Since the weighting function Qj(z) does not affect the Wiener filter, but only the energy of the filtered 
error, we can choose again Qj(z) equal to Fj(z), the inverse of the spectral factor of ^jiz). This choice 
simply operates a normalization of the error spectrum at every frequency. Thus, similarly to what we have 
done in the non-causal framework, we can define the function 



d c (X 3 ,Xi) :=\E 



{Fj{z){Xj-WP j {z)X i ))' 



(15) 



to represent the modeling error on the process Xj due to the application of the causal Wiener filter to 
Xi. To highlight the properties of the present approach, it is important to notice a difference between dc 
and the function d defined by (|11D . Indeed, it is straightforward to observe that dc is not symmetric and, 
therefore, that it can't be a metric, providing us with a less general tool to handle the processes. However, 
dc still represents a modeling error and in this respect it well suits the original problem of quantifying the 
"relatedness" between two processes. 

As in the previous case, we introduce for a given set 8 counting N processes the weight matrix Dq G M. NxN 
such that 

D%(j, i) := d c (Xj, Xi), VX, , X, G 6 . (16) 

It is worth observing that -D@(i, i) still represents the capacity of Xi in describing Xj according to the SISO 
model (fTB")) . but only via causal filters. 



3 Graphical models 
3.1 ALN formulation 

According to the above introduced scenario, let us depict each stochastic process Xj(t) as the superposition 
of linear dynamic transformations of the other processes' outputs: 

JV 

Xj(z) = ej(z)+ Yl WjityXiiz), (17) 
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where Wji(z) is a suitable transfer function and ej is the model error. In this framework, it can be considered 
interesting to find the set of {Wji(z)}ij—i,.,N,ijtj which allows us to best describe the time series according 
to the least squares criterion 

min£>[4(*)] , (18) 

3 

where £j(z) = Qj(z)ej(z) and the Qj(z) are dynamic weight functions, i.e. transfer functions. The best 
description provided by (|17[) according to (|18[) could be exploited in order to infer dynamic relations between 
the random processes. The problem with such an approach is due to the complexity of the final model, since 
it may be given by an a priori large number of transfer functions, namely N(N — 1). Hence, it is quite natural 
to develop a suboptimal strategy to reduce its complexity. In particular, this boils down to the problem of 
choosing for every j = 1, . . . , N a set I (J) C {1, . . . , j — 1, j ; + 1, • • • , n}, in order to satisfy a certain trade-off 
between the number of elements in each and the global cost (|18p associated to the modeling errors 

e j (z)=X j (z)- £ W jT (z)X T (z). (19) 
rei(j) 

The simplified linear model (fT!?]) represents a suboptimal strategy with respect to considering all the possible 
contributions present in (|17l) . Nonetheless, if no a priori assumptions are introduced, the choice of the most 
suited sets may turn out a complex procedure, since several combinatorial searches are needed. Therefore, 
in the following we develop a simplified approach inspired by graphical models, such as Bayesian Networks 
(BN) and Markov Random Fields (MRF). These models consist of interconnected nodes representing random 
variables, which are employed to provide a statistical description of the system. In particular, in both cases 
the inference problem associated to each node can be solved by the knowledge of a set of conditionally related 
random variables, which are depicted as the adjacent nodes (MRF) or as a proper extension of this set (BN). 
Similarly, we intend to design a procedure to build from the orginal set of processes a graph, such that its 
topology has a direct interpretation in terms of the sets 

3.2 Undirected graph 

Undirected arcs are well suited to represent mutual relationships, where one subject affects an other and 
viceversa. In this perspective, between two nodes j and i there exists only one edge and, then, the matrix 
weighting the arcs results symmetric. We choose Wy equal to D@ as in (fl2|) . so that the weight of each edge 
is a direct expression of the modeling capability between the connected nodes according to the (fT5)) paradigm. 
Then, assume that, according to the above Wy, the i-th node is the best neighbor of the j-th, while the fc-th 
is the best connection for the z-th. Observe that the weight DQ(i,k) is the minimum among all the values 
{Dq(i, r)}, r = 1, . . . , i — 1, i + 1, . . . , N, while Dg,(i, j) may be ranked in any other position. However, even 
though the j-th node may not be the most suited one to represent by itself the i-th, the fact, that the i-th 
is the best choice to describe with a single process the j-th, is meaningful. In particular, we can read this 
information as the fact that the j-th node shares with the j-th a certain amount of information, which no 
other process can model. This idea derive from a common observation in Informatics: if two different signals 
are able to represent the 90% of another process, but they do describe exactly the same 90%, then there is 
no improvement in using them both to model it. Conversely, it would be more useful to find out a signal 
describing, for instance, only a 20% of the objective process, if it can explain some missing information from 
the first chosen signal. In such a scenario, we will take both processes Xj and Xh as inputs for a possible 
MISO model © of X t . 

Observe that the above reasoning implicitly assumes that, from the modeling point of view, the amount 
of shared information between two processes decreases as the length of the (weighted) path linking them 
increases. Figuratively, this property can be regarded as though the presence of intermediate nodes in the 
path weakens the exchanged information. In this perspective, then, the influence of distant nodes reaches 
the objective process as filtered by neighbors of this latter. Due to this fact, we want to design a connected 
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graph, since without any a priori information about the signals nature, the best practice is to assume that 
every process is able to affect all the others to a certain extent. Conversely, the choice of enforcing a unique 
path between every couple of nodes, that is of providing an acyclic graphical model, turns out reasonable in 
order to keep the complexity low, according to the suboptimal solution perspective. 

It is worth observing that a connected graph, such that for every couple of nodes it exists a unique path 
linking them and such that every node is directly attached to the best neighbour according to the weight 
matrix, is a Minimum Spanning Tree [7]. An alternative definition is the connected acyclic graph with the 
least total weight. Efficient algorithms exist for its computation once the weight matrix is given. 

3.3 Directed graph 

Directed arcs are best suited to deal with binary relationships which differ according to the chosen order of 
the objects. Therefore, since we are also concerned with causal dependencies among the processes, we are 
going to develop a directed graph, whose purpose is the same as the previous one, i.e. providing a reasonable 
description of the connectivity structure of the entire network. As in the previous case, moreover, the resulting 
graphical model can be possible exploited in order to single out a suboptimal solution for the inputs of the 
MISO model (g]) of each process. 

To this aim, recalling the previous approach, it is worth observing that Dq provides distinguished information 
for every couple of node, according to the choice of the input. However, in order to derive a directed graph, 
let us choose first Wy such that 



so that the weight of each edge describes the best capability of one of the linked nodes in describing the 
other according to the (|13l) paradigm, also taking into account the differences due to the causality property. 
Moreover, to keep track of the best input in each couple, let us define the matrix H(Wv, Dq) such that 



Similarly to the previous case, let us apply the Minimum Spanning Tree algorithm to the weight matrix 
Wy as defined in (f2"0")) . Then, choose for each edge (j, i) the direction from the i-th node to the j-th one, if 
%(Wy, -Dq)(j, i) = +1, and the opposite otherwise. The resulting graph is a polytree with minimum cost |15| . 
Analogously to the previous case, we use the arcs to represent a input /output relationships. In particular, 
observe that the direction of each edge defines which node would act as the input when modeling the processes 
via MISO linear representations. Hence, in terms of the model (jj), a suboptimal though natural choice for 
each turns out the set of the parents nodes of the j-th process. Observe that according to the polytree 
graph some processes may miss any input, due to the causal property of the considered relationships. 

The above procedures, developed to design simple graphical models of a set of processes, are summarized 
in Table [TJ It is worth highlighting that, once such a model has been derived, the analysis of the connections 
of its j-th node can be possibly exploited in order to design a reasonable input set I(j) of a representation 



4 Topological identification 

See [12] for a comprehensive reading about all the (omitted) mathematical details of this section. 
4.1 Wiener filtering approach 

In this section we rigorously investigate the reliability and correctness of the proposed graphical modeling 
procedure when it is employed to reconstruct the topology of a given network. To this aim, considering the 




(20) 




(21) 
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Undirected graph: 

1 . represent each process as a node of the graph, obtaining the nodes set V 

2 . compute the matrix Dq as in ([T2"j) 

3 . choose the weight matrix of the graph Wy equal to Dq 

4 . compute the Minimum Spanning Tree algorithm on the weights of Wy 

5 . choose as graphical model the Minimum Spanning Tree. 

Directed graph: 

1 . represent each process as a node of the graph, obtaining the nodes set V 

2 . compute the matrix Dq as in (|16p 

3 . choose the weight matrix of the graph Wy as in (|2T))) 

4 . compute the Minimum Spanning Tree algorithm on the weights of Wy 

5 . for each arc choose its direction according to (|2"Tj) 

6 . choose as graphical model the resulting polytree. 



Table 1: Procedures to design simple graphical models. 



above theory, we take into account the class of the ALN models as the unknown network whose topology has 
to be identified. Moreover, we propose a comparison between our approach and the results of the application 
of the standard MISO Wiener theory, which is briefly summarized hereafter. 

Definition 11 (MISO Wiener filter). Let us consider the processes Y and {Xi} i=1 and assume that they 
are wide sense stationary and with zero mean. Then, consider the MISO linear filter 



that describes the process Y by {Xi},* and that minimizes the modelling error 

1 /" +7r 

min E [el = min — / 6 e [u))dw , (22) 
Wi 1 J w, 2vr J_„ x ' y ' 

i— l,...,n i=l,...,n 

n 

e(z) = Y(z) - Wi(z)Xi(z) . (23) 
i=i 

Such a filter is referred to as the MISO Wiener filter. 

Let us now consider the class of Acyclic Linear Networks (ALN), that will be used to highlight the 
correctness of both our graphical method and the MISO Wiener filter in identifying a given topology. Recalling 
the previous definition, we can see a network in such a class as the interconnection of m dynamical linear 
SISO systems, whose output signals {Xj}^ =1 m are linked via relationships described as: 

Xi(z) = ei(z) + Y,i:XieP(Xi) Gu{z)X l {z) , 

(24) 

X n {z) = e n (z) + E l: x, e p(x n ) G m {z)X l {z) , 

where V(Xj) denotes the set of the inputs of the j-th system. According to the convention previously adopted, 
let us describe the links structure of the network by means of a graph, where the j-th node represents the 
j-th system and its output Xj and where the edge between the j-th and the z-th nodes is present, if and 
only if the signal Xj is an input of Xj, or viceversa. Therefore, recalling the ALN definition, we can cast the 
concepts of parent and child directly in terms of input and output of a system. 
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Definition 12. The set V{Xj) of the signals, which are inputs of the j-th one, is referred to as the set of the 
parents of Xj. The setC(Xj) of the signals which have Xj as a parent is referred to as the set of the children 
of Xj . A signal that has no parent is said a root, while a signal without any children is said a leaf. 

Observe that in the above network (pM)) the disturbances {ej} . =l n are a set of jointly uncorrelated, zero- 
mean and wide-sense stationary processes, as defined in the ALN definition. Moreover, define the transfer 
function set Q = {Gji(z)} i . =1 n so that Gji(z) ^ if and only if X; G V(Xj). An analogous reasoning can 
be performed for the path concept. 

Definition 13. Given two signals Xj and Xk, if it exists a (unique) sequence Tjk = { T i\i = \ m > with T\ = j 

and r m = k, representing m > 1 signals X Ti , such that Hjk(z) = YYiLi 1 G Ti T i+1 (z) ^ 0, then Tjk is said a 
(directed) path of length I = m — 1 between Xj and Xk and the related Hjk(z) is the total transfer function 
associated to it. 

In the following it will be useful to define Ha(z) = 1 for any value of i. Because of the polytree structure, 
developing each Xj by iteratively substituting ([Ml) in the expression of its components, we have that the 
node Xj itself never appears in the right hand side of the expression. In fact, the above development leads 
to describe every signal via disturbances only: 



Xj = ^Hj l {z)e i 



(25) 



Without loss of generality, denote Y = X n and let us formulate the MISO Wiener filtering problem (|23[) - 
(|2"2"|) regarding the modelling of Y via the processes {Xj}._^ n _ v To this aim, let us define the following 
quantities 

E(z) = [e 1 (z) ... e n (z)f , 
W{z) = [Wxiz) . . . W n ^(z)] T , W n (z) = 0, 
K(z) = [H ln (z) . . . H nn (z)f = = [K^z) . . . K n (z)f , H nn (z) = 1 



H{z) = 



Hn(z) 



-ffln-l(z) 



H n i(z) ... H nn -i(z) 

Therefore, in the minimization problem (|22l) the term (|23[) can be rewritten as 

e(z) = K T {z)E(z) - W T {z)H{z)E{z) = E T {z) (K(z) - H{z) T W(z)) . 

Observe that E T (z) (K(z) - H(z) T W(z)) is a linear combination of the signals ej, which are uncorrelated. 
Hence, the problem boils down to 



mm 

w 



/+7r n 
E 

1 = 1 



K 3 {u)-Y,H ij {uj)W 1 {uj) 



4> ej (uj)duj , 



(26) 



which is formally equivalent to solve the linear least square problem 



min \\K(u)) - H (u)Y W(u)\ 
w 11 1 



(27) 
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being ||-|| the euclidean norm. Intuitively, then, the signals ej act as a "basis" for the processes Xj and Y, 
being the incorrelation property formally analogous to orthogonality. A so particular formulation will lead to 
futher developing in the following sections. This latter formulation of the problem, suitably combined with 
concepts borrowed from Bayesian Network theory [T5], can be fruitfully exploited to investigate the results 
obtained via direct application of the MISO Wiener filtering to the topological identification problem. 

Definition 14. The Markov Blanket of the node Y of a polytree network is the ensemble of all the parents 
and children ofY and all the parents of nodes which are children ofY. 

Proposition 15. The MISO Wiener filter between Y and all the other processes {Xj}.^ n _ 1 may have 
non zero components, i. e. different form the zero transfer function, only with respect to the signals belonging 
to its Markov Blanket. 

Then, since the parents of Y have distance equal to one from the parents of its children, the following 
result holds. 

Corollary 16. The undirected version of the topology of a linear polytree network defined as in (|24p can be 
exactly singled out via multiple applications of the MISO Wiener filter only by using also the corresponding 
distance matrix (|12p to purge the parents of the children of the considered node. 

4.2 The local coherence-based approach 

In the following we investigate the results obtained by the application to ALNs of the graphical modeling 
procedure introduced in the previous sections. Moreover, a comparison with the Wiener filtering approach 
will be provided in order to highlight the advantages of our method. 

To this aim, let us consider a network of interconnect linear systems of the form (|24j) . also assuming that 
their link structure is described by a polytree topology. Consider the non-causal distance matrix (|12|) and for 
each possible tuple k) assume that it exists an arbitrary small interval I such that 

&c,x,(w)6*M^o Vwei. (28) 

Then, the following result holds. 

Proposition 17. If the i-th and j-th systems are actually linked in the real network, i.e., if Xi is an input of 
Xj or viceversa, then the indirect edge connecting the i-th and the j-th nodes belongs to the MST computed 
via the non- causal distance matrix (|12p . 

Corollary 18 (Main result). If condition (|28[1 is satisfied, then the MST computed on Dq provides the 
indirect topology of the original polytree network. 

It is interesting to observe that for polytree linear network the proposed technique, based on the com- 
putation of the MST with respect to the weights matrix Dq , provide the exact identification of the indirect 
version of the original topology, while the application of the Wiener filter does not. 

5 Compressive sensing 

See [22] for details. 

5.1 Preliminary considerations 

Recently, in [23] and [3] interesting equivalences between the identification of a dynamical network and a lo 
sparsification problem are highlighted, suggesting the difficulty of the reconstruction procedure [3J. 
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The main idea is to cast the problem as the estimation of a "sparse Wiener filter". Recalling the problem 
formulation in the previous sections, given a set of N stochastic processes X — {x\, x n }, we consider each 
Xj as the output of an unknown dynamical system, the input of which is given by at most rrij stochastic 
processes {x aj t , x aj m . } selected from X \ {xj}. The choice of {x aj ± , x aj m . } is realized according to a 
criterion that takes into account the mean square of the modeling error. The parameters rrij can be a priori 
defined, if we intend to impose a certain degree of sparsity on the network, or a strategy for self-tuning can be 
introduced penalizing the introduction of an additional link if it does not provide a significant reduction of the 
cost. For any possible choice of {x aj 1 , x aj m . }, the computation of the Wiener Filter leads to the definition 
of a modeling error which is a natural measure of how the time series {x aj ± , x a . m } can "describe" the 
output Xj in terms of predictive/smoothing capability. Once this step has been performed, each system is 
represented by a node of graph and, then, the arcs linking any x aj mfc to Xj are introduced for each node Xj. 
At the end of this procedure a graph, representing the network topology, has been obtained. 
We will show that this way of casting the problem has strong similarities with ^-minimization problems, 
which have been a very active topic of research in Signal Processing during the last few years. Indeed, a ^o - 
minimization problem amounts to finding the "sparsest" solution of a set of linear equations [4] . Unfortunately, 
with no additional assumptions on the solution, the problem is combinatorially intractable [4]. This has 
propelled the study of relaxed problems involving, for example, the minimization of the norm-1 which is a 
convex problem and it is known to provide solutions with at least a certain order of sparsity |23| . We exploit 
the similarity of the two problems borrowing some algorithmic tools which have been developed in the area 
of Signal Processing adapting them to our needs. We start introducing a pre-Hilbert space for wide-sense 
stochastic processes, where the inner product defines the notion of perpendicularity between two stochastic 
processes. This allows us to seamlessly adopt well-known greedy algorithms for a (suboptimal) solution of the 
problem, such as Matching Pursuit (MP) or Orthogonal Least Squares (OLS), which are based on iterated 
projections. 

5.2 Pre-Hilbert space 

Hereafter a method to build up a pre-Hilbert space for stationary random processes is presented. The 
mathematical details are omitted for the sake of simplicity (see |22| ) 

Definition 19. Let e(i) := (ei(t), ... e]y(t)) T a N -dimensional time- discrete, zero-mean, wide-sense station- 
ary random process defined for t G Z, such that, for any i,j G {1, ...,N}, the power spectral density $ eie . (2) 
exists on the unit circle \z\ = 1 of the complex plane and is real-rational. Formally, we write 

with A(z), B(z) real coefficient polynomials such that B(z) ^ for any z£C, \z\ = 1. 
We say that e is a vector of rationally related random processes. 

Definition 20. Define the sets 

J- := {W{z)\W{z) is a real-rational scalar function of z G C defined for \z\ = 1} 
0jrmxn _ < w < z ^ w r^ g C ™x« and any f its en tries is in °J 7 }. 

Proposition 21. The ensemble ( °J-e, +, -,M) is a vector space. 

Definition 22. We define a scalar operation < •, • > on °J r e in the following way 

< xi,x 2 >:= R Xl x 2 (0)- 

Proposition 23. The set °J-e, along with the operation < ■, ■ > is a pre-Hilbert space (with the technical 
assumption that x\ and Xi are the same processes if x\ ~ x 2 ). 
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Definition 24. For any x G °J r e we denote the norm induced by the inner product as 

\\x\\ :=< x, x > . 

We provide an ad-hoc version of the Wiener Filter (guaranteeing that the filter will be real rational) with 
an interpretation in terms of the Hilbert projection theorem. Indeed, given signals y,x\, ...,x n G° J-e, the 
Wiener Filter estimating y from x :— (xi,...,x n ) can be interpreted as the operator that determines the 
projection of y onto the subspace °J 7 x. 

Proposition 25. Let e be a vector of rationally related processes. Let y and x\,...,x n be processes in the 
space °J 7 e. Define x := (x\, ...,x n ) T and consider the problem 

If$> x (u) > 0, for all bj G [— 7r,7r] 7 the solution exists, is unique and has the form 

W{z) = * vx (z)$ xx (z)- 1 . 

Moreover, for any W'(z) 6 °J rlxn x, it holds that 

< y-W(z)x,W'(z)x >=0. (30) 

Let e be a vector of rationally related processes. Consider a set X := {x\, ...,x n } C °J 7 e of n rationally 
related processes with zero mean and known (cross)-power spectral densities <fr XiXj (z). For any given process 
xj, with j G {1, n}, fix rrij G N such that < rrij < n — 1. We intend to identify the rrij processes x a k , 
a j.k ^ j, with k = l,...,m,j, which, filtered by suitable rational transfer functions Wj tCCj k (z), provide the 
best estimate of Xj in the sense of the mean squares. The mathematical formulation of the problem is the 
following: 

2 

(31) 

where every Wj^ aj k (z), with k = 1, mj, is a possibly non-causal transfer function. Given any set {ptj,kYkLii 
the above problem is immediately solved by a multiple input Wiener filter. It is the determination of the 
parameters aj t k that makes the problem combinatorial. 

The signals x a . k G X represent the signals with the "strongest affinity" with Xj according to a least 
square criterion. Moreover, as previously observed, the result of this optimization problem lends itself to a 
natural interpretation in terms of Graph Theory. Recalling the approach of the previous sections, assume 
that the processes Xj, j — 1, ...,n, are represented respectively by the nodes Nj, j = 1, n, in a graph. For 
j = 1, ...,n, draw an oriented edge from each of the nodes N a k , k = l,...,m,j, to the node Nj. Following 
this procedure, the obtained edge set can be readily visualized and provides a qualitative description of the 
internal structure of the whole system in terms of the most relevant linear relations. If the goal is to model 
a complex system as a network of suitable agents, the parameters m/s, defining the maximum number of 
entering edges for the nodes Nj's, can be used to adjust the degree of complexity of the graph. Indeed, when 
uij = n — 1 for any j — 1, n, we expect to obtain a complete graph, that is all possible links will be present. 
Conversely, a different choice for the parameters m/s will lead to a reduction of complexity of the identified 
network model. 

5.3 Links with compressive sensing 

In this section we highlight the connections between the problem of reconstructing a topology and the com- 
pressive sensing problem. Such a connection is possible because of the pre-Hilbert structure constructed 



mm 



k=l 
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before. Indeed, the concept of inner product defines a notion of "projection" among stochastic processes and 
makes it possible to seamlessly import tools developed for the compressive sensing problem in order to tackle 
that of identifying a topology. 

In the recent few years sparsity problems have attracted the attention of researchers in the area of Signal 
Processing. The reason is mainly due to the possibility of representing a signal using only few elements 
(words) of a redundant base (dictionary). Applications are numerous, ranging from data-compression to 
high- resolution interpolation, and noise filtering [61 135). 

There are many formalizations of the problem, but one of the most common is to cast it as 

min IIieo — ^uH^ subject to ||n?||o < m (32) 

w 

where n < p; xo € R p ; W € R px " is a matrix, whose columns represent a redundant base employed to 
approximate xq, and the "zero- norm" (it is not actually a norm) 

H|o:=|{*eNK^O}| (33) 

is defined by the number of non-zero entries of a vector w. It can be said that w is a "simple" way to express 
xo as a linear combination of the columns of 'J, where the concept of "simplicity" is given by a constraint on 
the number of non-zero entries of w. 
For each j — 1, n, define the following sets: 

W« = {W(z) G °T lxn \Wj(z) = 0} , (34) 

where Wj(z) denotes the j-th component of W(z). For any W £ define the "zero-norm" as 

||W"||o = {# of entries such that 3 z e C, Wi{z) + 0} 

and define the random vector 

x = (xi, ... 7 x n ) T . (35) 

The problem (|3"Tj) can be formally cast as 

\\xj - Wx\\ 2 subject to ||W|| < m , (36) 

which is, from a formal point of view, equivalent to the standard Iq problem as defined in (|32l) . 



5.4 Solution via suboptimal algorithms 

The problem of network identification/complexity reduction we have formulated in this paper is equivalent 

to the problem of determining a sparse Wiener filter, as explained in the previous section, once a notion of 

orthogonality is introduced. This formal equivalence shows how the problem of identifying a topology can 

immediately inherit a set of practical tools already developed in the area of compressing sensing. 

Here we present, as illustrative examples, modifications of algorithms and strategies, well-known in the Signal 

Processing community, which can be adopted to obtain suboptimal solutions to the problem of identifying a 

network. 

While formally identical to (|3"2"]l . the problem of identifying a topology as cast in (|36[) still has its own 
characteristics. Since the "projection" procedure in (|36[) is given by the estimation of a Wiener filter, it is 
computationally more expensive than the standard projection in the space of vectors of real numbers. 
For this reason greedy algorithms offer a good approach to tackle the problem since speed becomes a funda- 
mental factor. Moreover, since the complexity of the network model is the final goal, greedy algorithms are 
a natural solution allowing one to specify explicitly the connection degree rrij of every node Xj . This feature 
is in general not provided by other algorithms. 

As an alternative approach to greedy algorithms we also describe a strategy based on iterated reweighted 
optimizations as described in [3]. 
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Figure 1: The tree structure obtained using a coherence-based analysis. Every node represents a stock 
and the color represents the business sector it belongs to. The considered sectors are Basic Material 
(yellow), Conglomerates (white), Healthcare (pink), Transportations (dark blue), Technology (red), 
Capital Goods (orange), Utilities (brown tints), Consumer (violet tints), Financial (green tints), Energy 
(grey tints) Services (light blue tints). Using the industry classification given by Google, the Financial 
sector has also been differentiated among Insurance Companies (light green), Banks (average green) and 
Investment Companies (dark green); Services have been divided in Information Technology (cyan) and 
Retail (aquamarina) , Consumer in Food (plum) and Personal-care (purple) ; Energy in Oil & Gas (dark grey) 
and Well Equipment (light grey); Utilities in Electrical (dark brown) and Natural Gas (light brown). The 
complete stock list is reported in Table [5] 

6 Application to clustering and graphical modeling 
6.1 High-frequency stock market analysis 

A collection of 100 stocks of the New York Stock Exchange has been observed for four weeks (nineteen market 
days), in the lapse 03/03/2008 - 03/28/2008 sampling the prices every 2 minutes. The stocks have been chosen 
as the first 100 stocks with highest trading volume according to the Standard & Poor Index at the first day 
of observation. An a priori organization of the market has been assumed in accordance with the sector and 
industry group classification provided by Google Finance®, that is also the source of our data. We underline 
that in this paper we are mainly concerned with sectors, but in some cases we have also taken into account the 
industry group classification to refine the results. The whole observation horizon spans the month of March. 
We have not considered reasonable the hypothesis that the mutual influences among the companies are 
stationary during this time period. However, it is worth observing that the market session data are naturally 
divided into subperiods, namely weeks and days and discontinuities are present. Indeed, due to the pre- and 
post-market sessions, there is a discontinuity between the end value of a day and the opening price of the next 
one. Therefore, in our analysis, we have followed the natural approach of dividing the historical series into 
nineteen subperiods corresponding to single days. Moreover, we have assumed the working hypothesis that 
the relations between the companies can be approximated with stationary ones during a single day. Hence, 
we have performed the multivariate analysis for each subperiod computing the coherence-based distance (1111) 
among the stocks. Finally, we have averaged such distances over the whole observation horizon and the 
related results have been exploited to extract the MST, providing the corresponding market structure. We 
find useful to remark that the computation of the distances for small data sets is also computationally better 
performing. The corresponding results are illustrated in Figure [TJ Every node represents a stock and the 
color represents the business sector or industry it belongs to. We note that the stocks are quite satisfactorily 
grouped according to their business sectors. We stress that the a priori classification in sectors is not a 
hard fact by itself and we are not trying to match it exactly. A company could well be categorized in a 
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sector because of its business, but, at the same time, could show a behaviour similar to and explainable 
through the dynamics of other sectors. Actually, we would be very interested into finding results of this kind. 
Indeed, in those very cases, our quantitative analysis would provide the greatest contributions detecting in 
an objective way something which is "counter-intuitive". Thus, we just use such a priori classification as a 
tool to check if the final topology makes sense and if, at a general level, the coherence approach performs 
better. Despite this disclaim, it is worth noting that the Financial (green tints), Consumer (violet tints), 
Basic Materials (yellow), Energy (grey tints) and Transportation (dark blue) sectors are all perfectly 
grouped, with no exceptions. We note a subclusterization of the Financial sector, as well. The Consumer 
sector shows another prominent subclusterization in the Food (plum) and Personal/Healthcare (purple) 
industries, while the Energy sector presents an evident subclusterization into the Oil & Gas (dark grey) 
and Oil Well Equipment (light grey). The close presence of companies of a different sector and industry, 
Utilities/Natural Gas (light brown) is observed as well. The other Utilities/Electricity companies 
(dark brown) are, interestingly, a different group. We also observe a big cluster of companies classified as 
Services (light blue tints). We have differentiated them in the two industries Retail and Information 
Technology using two slightly different colors, respectively aquamarine and cyan. We also note the presence 
of three Services companies which are isolated from the other ones: V [Verizon], T [AT&T], and S [Sprint]. 
All of them are telephone companies. This might suggest that this industry should show at least a slightly 
different dynamics from the other service companies. Note also how the Technology sector (red) is almost 
perfectly grouped and how IBM, an IT company, even though classified as a Services company, is located in 
it. Finally, the two only automobile companies GM and F [Ford] happen to be linked together. The analysis 
of this four weeks of the month of March cleanly shows a taxonomic arrangement of the stocks even though 
the choice of a tree structure might have seemed quite reductive at first thought. 



6.1.1 Benefits of the dynamical modeling 

In this section we highlight the improvements of the dynamical approach, i.e. the coherence-based distance 
(fTTj) . against the static procedure, i.e. the traditional correlation-based distance used, for example, in |18) . 
However, the application of the correlation-based analysis to the entire data set would be meaningless, since 
the stocks are sampled at high frequency over a long observation horizon. Hence, to avoid the non-stationary 
problem, we perform the procedure in the daily subperiods, averaging the corresponding distances. Figure^ 
illustrates a comparison between the "heat map" corresponding to the correlation-based distance matrix and 
the coherence-based one. The energy of the error associated to the dynamical modeling approach is expected 
to be lower than the static one. Conversely, the correlation-based distance is expected to be higher than 
the coherence-based metric. Such a property is highlighted in Figure [2] by the darker looking of the map (b) 
against map (a) . Indeed, it shows that the dynamical modeling approach is able to describe a larger variety of 
dependencies, especially relations involving the presence of delays. The Spearman index has been evaluated 
from these set of data obtaining a sp ~ 0.158. 

The MST derived from those correlation-based distances is illustrated in Figure [3J At a very first look, 
a lesser capability of grouping the stocks according to their sectors already appears. Moreover, a further 
analysis reveals, for instance, that the subclusterization of the Financial sector highlighted in the dynamical 
approach is no more present in the correlation-based MST. The same happens for the subclusterizations of 
the Consumer and Energy sectors. 

The higher consistency of the coherence-based metric can be also underlined considering the time evolution 
of the single distances. In Figure H] we have considered a test node belonging to the Technology sector and 
we have reported along the time horizon both its distances with respect to other Technology stocks and the 
distances related to the Financial ones. As expected, the dynamical modeling approach is more consistent 
and robust in detecting similarities among stock time series belonging to the same sectors. 
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Figure 2: A graphical representation of the averaged correlation (a) and coherence (b) distance matrices. 
Any distance matrix has 100 x 100 entries representing the distances of any possible couple of the 100 stocks 
in our analysis. These values have been represented using a grey scale: the lighter the spot, the large the 
distance between the two stocks. The interpretation of both the distances in terms of a modeling error shows, 
as expected, a better modeling capability for the dynamic approach. This can be considered an evidence for 
the presence of propagative phenomena in the network, at the considered time scale of 2 minutes. 




Figure 3: MST associated to the correlation-based metric. The distances have been computed in the daily 
subperiods and then averaged as in Figure [T] An increased number of stocks grouped with others belonging 
to a different sector is experienced. For instance, HPQ [Hewlett-Packard] is no more grouped with the other 
Technology stocks. One might also observe that GM and F are not directly linked or that the Consumer stocks 
(cyan) are almost spread over the whole graph. Furthermore, the subclusterization of the Financial sector, 
highlighted in the Figure Q] is not present. 
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Figure 4: The distances of 5 financial stocks (blue) and 5 technological stocks (red) from a sample technological 
stock have been computed on a daily base. The figure represents both the correlation distances (a) and the 
coherence ones (b). As it can be noted, the correlation distance is more consistent and robust in detecting 
similarities among stocks belonging to the same sectors. 

6.2 Graphical models of European climatic regions 

In this section we provide an illustrative example to show the effectiveness of the proposed graphical modeling 
approach (see |I2| for a further reading on this case). To this aim, we report the results obtained by analyzing 
a set of real data. A crucial assumption lies in the adoption of linear models for the modeling process. Indeed, 
the application of our technique to real data is expected to be able to provide information about the linear 
component of the dynamics, while the nonlinear part will necessarily be embedded as noise. In other words, 
the corresponding results are expected to be as worse as the nonlinear dynamics prevail in the real network. 
However, though a linear analysis may be not sufficient to fully understand the process dynamics, we highlight 
that it provides a first step to derive information about the system, especially about its internal connections. 
Indeed, we remind that no a priori knowledge is assumed, thus ours is a fully blind approach. In general, 
it is possible to incorporate a priori knowledge into the modeling procedure if, for example, the dynamics is 
known to have a certain structure. However, the development of similar procedures goes beyond the scope 
of this paper and it will be matter of future research. 

In the following we address a collection of meteorological time series sampled in 50 sites located in as much 
European towns, listed in Table [3J Our main goal is obtaining information about the similarities among the 
temperatures in those towns, or, rather, deriving a suitable topological model. 

The temperatures in the above locations are observed over a one year time horizon covering the whole 1988. 
Such time series are hourly sampled, providing a high frequency data set 0. Notice that consequently both 
daily and seasonal trends may be pointed out from the observations. To this regard, any deterministic trend 
has to be removed from the original time series. Hence, let {Yk(n)} n=1 g76 oi k = 1, . . . , 50, be the hourly 
sampled temperatures in the considered towns. Any trend different from the daily one can be singled out 
just by observing the mean value over a 24 hours span, i.e. 



assuming ifc(n) = 0ifn<lorn> 8760 as a working hypothesis. Therefore, we provide the rejection of 
such seasonal trends just by defining the new time series 




n = 1,...,8760, k = 1,...,50, 



i=-12 



X k {n) :=Y k (n)-S k (n), fc = l,...,50. 



1 International Surface Weather Observations 1982-1997, Volume 3 (Europe) , 1998 Asheville, N.C 
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Table 2: List of the companies considered in the analysis. 

1 Code I S fir tor ] | Name | OnT 



3M Company 


MMM 


Conglomerates 


Abbott Laboratories 


ABT 


Healthcare 


Aes Corporation 


AES 


Utilities 


Alcoa Inc. 


AA 


Basic Materials 


Allegheny Technologies Inc. 


ATI 


Basic Materials 


Allstate Corporation 


ALL 


Financial 


Altria Group 


MO 


Consumer Non- Cyclical 


American Electric Power 


AEP 


Utilities 


American Express 


AXP 


Financial 


American International Group 


AIG 


Financial 


Amgcn Inc. 


AMGN 


Healthcare 


Anhcuser Busch 


BUD 


Consumer 'Non- Cyclical 


Apple Inc. 


AAPL 


Technology 


AT&T 


T 


Services 


Avon Products 


AVP 


Consumer Non-Cyclical 


Baker Hughes Inc. 


BHI 


Energy 


Bank of America 


BAC 


Financial 


Bank of New York Mellon 


BK 


Financial 


Baxter [international 


BAX 


Healthcare 


Boeing 


BA 


Capital Goods 


Bristol Mvlth Squibb 


BMY 


Healthcare 


Burlington Northern Santa Fe 


BNI 


Transportation 


Campbell Soup 


CPB 


Consum e r Non-Cyclical 


Capital tine Financial 


COF 


Financial 


Caterpillar Inc. 


CAT 


Capital Goods 


CBS 


CBS 


Services 


Chevron 


CVX 


Energy 


CIGNA 


CI 


Financial 


Cisco Systems 


CSCO 


Technology 


Citigroup Inc 


c 


Financial 


Clear Channel Communications 


ecu 


Services 


Coca-Cola 


KO 


Consumer Non- Cyclical 


Colgate Palmolive 


CL 


Consumer Non-Cyclical 


Comcast 


CMCSA 


Services 


Conoco Phillips 


COP 


Energy 


Covidicn 


cov 


Healthcare 


CVS Caremark 


CVS 


Services 


Dell Inc 


DELL 


Technology 


Dow Chemical Company 


DOW 


Basic Materials 


E.I. du Pont de Nemours 


DD 


Basic Materials 


El Paso 


EP 


Utilities 


EMC 


EMC 


Technology 


Entergy 


ETR 


Utilities 


Exclon 


EXC 


Utilities 


Exxon Mobil 


XOM 


Energy 


FedEx 


FDX 


Transportation 


Ford Motor 


F 


Consumer Cyclical 


General Dynamics 


GD 


Capital Goods 


General Electric 


GE 


Conglomerates 


General Motors 


GM 


Consumer Cyclical 



Goldman .Sachs Group 


GS 


Financial 


Google Inc. 


GOOG 


Technology 


Halliburton 


HAL 


Energy 


Hartford Financial Services 


HIG 


Financial 


H. J. Heinz 


HNZ 


Consumer Non-Cyclical 


Hewlett-Packard 


HPQ 


Technology 


Home Depot 


HD 


Services 


Honeywell International 


HON 


Capital Goods 


Intel 


INTC 


Technology 


International Business Machines 


IBM 


Services 


International Paper 


IP 


Basic Materials 


Johnson & Johnson 


JNJ 


Healthcare 


JPMorgan Chase 


JPM 


Financial 


Kraft Foods 


KFT 


Consum e r Non-Cyclical 


Lehman Brothers Holding 


LEH 


Financial 


McDonald's 


MCD 


Services 


Medtronic 


MDT 


Healthcare 


Merck 


MRK 


Healthcare 


Merril Lynch 


MER 


Financial 


Microsoft 


MSFT 


Technology 


Morgan Stanley 


MS 


Financial 


Norfolk Souther Group 


NSC 


Transportation 


NYSE Euroncxt 


NYX 


Financial 


Oracle 


ORCL 


Technology 


Pcspi 


PEP 


Consum e r Non-Cyclical 


Pfizer Inc. 


PFE 


Healthcare 


Procter & Gamble 


PG 


C'< >n.~uiner ■ Non- Cyclical 


Raytheon 


RTN 


Conglomerates 


Regions Financial 


RF 


Financial 


Rockwell Automation 


ROK 


Technology 


Sara Lcc 


SLE 


0>ii-mm>r,Xon-CyclicaI 


Schlumbcrgcr Limited 


SLB 


Energy 


Southern 


SO 


Utilities 


Sprint Nextel 


S 




Target 


TGT 




Texas Instruments Inc. 


TXN 


Technology 


Time Warner 


TWX 


Services 


Tyco International 


TYC 


Conglomerates 


U. S. Bancorp 


USB 


Financial 


United Parcel Service 


UPS 


Transportation 


United Technologies 


UTX 


Conglomerates 


UnitedHealth Group Inc. 


UNH 


Financial 


Verizon Communications 


VZ 




Wachovia 


WB 


Financial 


Wal-Mart Stores 


WMT 




Walt Disney 


DIS 




Wells Fargo 


WFC 


Financial 


Weyerhaeuser Company 


WY 


Basic Materials 


Williams Companies 


WMB 


Utilities 


Xerox 


XRX 


Technology 



This technique to detrend data is analogous to the procedure reported in [25]. It is worth noticing that the 
new time series means are approximately equal to zero, since the temperature variation over a 24 hours time 
span is fairly distributed around the mean value, as expected by such time series. Observe that also this 
daily trend should be assumed as a deterministic periodic component, but, due to the nature of the data, its 
range is expected to vary along the year. Theoretically, we could apply a "windowed" approach to point out 
how the daily behavior changes, but the choice of the related time length should be derived by the seasonal 
trends. Therefore, for the sake of the simplicity, in this example we just neglect to remove the daily trend, 
considering it as part of the stochastic variation of the signal. 

In the following analysis we consider a given data set and we do not require any on-line computation. 
Moreover, only the non-causal approach will be presented, since the causal one just differs for the numerical 
tools used to compute the distance matrix. 

The coherence functions of the detrended time series {^fc}fc=i ) ... ) 5o are numerically evaluated by employing 
the Welch algorithm [M]. Then, the corresponding distances are computed according to (fTTj) . providing a 
quantitative description of similarities and connections among the temperatures in the considered towns 
during the year 1988. Such information can be exploited to choose the best source of each time series 
according to the minimum modeling error approach and to derive a connected tree topology according to the 
procedure summarized in Table [TJ The corresponding graphical model is illustrated in Figure [5l 
Even though the approach is blind and it does not exploit any a priori knowledge, the modeled topology 
absolutely turns out reasonable. Towns belonging to the same climatic region are correctly linked together, 
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Aberdeen 


ABR 


Bordeaux 


BRX 


Berlin 


BRL 


Barcelona 


BRC 


Belfast 


BLF 


Brest 


BRE 


Bremen 


BRE 


Granada 


GRN 


Birmingham 


BRM 


Chartres 


CHR 


Dresden 


DRS 


Madrid 


MDR 


Bournemouth 


BMT 


Dijon 


DJN 


Dusseldorf 


DSS 


Malaga 


MLG 


Bristol 


BRS 


Lyon 


LYN 


Frankfurt Am Main 


FRN 


Salamanca 


SLM 


Cardiff 


CRD 


Marseille 


MAR 


Hamburg 


HMB 


Santiago De Compostela 


SDC 


Carlisle 


CRL 


Montpcllier 


MNT 


Hannover 


HAN 


Sevilla 


SVL 


Edinburgh 


EDN 


Nancy 


NCY 


Heidelberg 


HDL 


Valencia 


VAL 


Exeter 


EXE 


Nantes 


NAN 


Kiel 


KIE 


Zaragoza 


ZAG 


Glasgow 


GSW 


Nice 


NIC 


Leipzig 


LPZ 






London 


LND 


Orleans 


ORL 


Mannheim 


MNH 






Plymouth 


PLY 


Paris 


PRS 


Munich 


MUN 






Manchester 


MAN 


Strasbourg 


STR 


Stuttgart 


STG 






Nottingham 


NOT 


Toulouse 


TOU 











Table 3: The 50 European towns considered in Section and their labels. 




Figure 5: Solid plus dashed lines: the tree topology obtained by the application of the undirected procedure 
of Table Q] for the towns reported in Table [3] Solid lines: connections corresponding to the minima of each 
row of the undirected distance matrix. 

also forming consistent clusters in term of the links associated only to the minima of the rows of the distance 
matrix. Moreover, some connections representing the propagative effects of the weather are provided, as well. 
Hence, the corresponding topology appears consistent with the problem of deriving a reduced optimal model 
of the whole system. For instance, observe that, according to our approach, in order to derive a suitable 
reduce linear model to describe the temperatures in Edinburgh, we should consider a MISO linear hlter with 
the temperatures of Aberdeen, Carlisle and Glasgow as inputs. 
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